AITopics | regression 0

Collaborating Authors

regression 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dimensionality Reduction on IoT Monitoring Data of Smart Building for Energy Consumption Forecasting

Koutras, Konstantinos, Bompotas, Agorakis, Halkiopoulos, Constantinos, Kalogeras, Athanasios, Alexakos, Christos

arXiv.org Artificial IntelligenceNov-4-2025

The Internet of Things (IoT) plays a major role today in smart building infrastructures, from simple smart-home applications, to more sophisticated industrial type installations. The vast amounts of data generated from relevant systems can be processed in different ways revealing important information. This is especially true in the era of edge computing, when advanced data analysis and decision-making is gradually moving to the edge of the network where devices are generally characterised by low computing resources. In this context, one of the emerging main challenges is related to maintaining data analysis accuracy even with less data that can be efficiently handled by low resource devices. The present work focuses on correlation analysis of data retrieved from a pilot IoT network installation monitoring a small smart office by means of environmental and energy consumption sensors. The research motivation was to find statistical correlation between the monitoring variables that will allow the use of machine learning (ML) prediction algorithms for energy consumption reducing input parameters. For this to happen, a series of hypothesis tests for the correlation of three different environmental variables with the energy consumption were carried out. A total of ninety tests were performed, thirty for each pair of variables. In these tests, p-values showed the existence of strong or semi-strong correlation with two environmental variables, and of a weak correlation with a third one. Using the proposed methodology, we manage without examining the entire data set to exclude weak correlated variables while keeping the same score of accuracy.

data mining, energy consumption, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ISC257844.2023.10293689

2506.22468

Country: Europe > Greece > West Greece > Patra (0.06)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Energy (1.00)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.41)

Add feedback

Multilingual Hope Speech Detection: A Comparative Study of Logistic Regression, mBERT, and XLM-RoBERTa with Active Learning

Abiola, T. O., Abiodun, K. D., Olumide, O. E., Adebanji, O. O., Calvo, O. Hiram, Sidorov, Grigori

arXiv.org Artificial IntelligenceSep-25-2025

Hope speech language that fosters encouragement and optimism plays a vital role in promoting positive discourse online. However, its detection remains challenging, especially in multilingual and low-resource settings. This paper presents a multilingual framework for hope speech detection using an active learning approach and transformer-based models, including mBERT and XLM-RoBERTa. Experiments were conducted on datasets in English, Spanish, German, and Urdu, including benchmark test sets from recent shared tasks. Our results show that transformer models significantly outperform traditional baselines, with XLM-RoBERTa achieving the highest overall accuracy. Furthermore, our active learning strategy maintained strong performance even with small annotated datasets. This study highlights the effectiveness of combining multilingual transformers with data-efficient training strategies for hope speech detection.

artificial intelligence, machine learning, speech detection, (18 more...)

arXiv.org Artificial Intelligence

2509.20315

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
South America (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Aleks: AI powered Multi Agent System for Autonomous Scientific Discovery via Data-Driven Approaches in Plant Science

Jin, Daoyuan, Gunner, Nick, Janke, Niko Carvajal, Baruah, Shivranjani, Gold, Kaitlin M., Jiang, Yu

arXiv.org Artificial IntelligenceAug-28-2025

Modern plant science increasingly relies on large, heterogeneous datasets, but challenges in experimental design, data preprocessing, and reproducibility hinder research throughput. Here we introduce Aleks, an AI-powered multi-agent system that integrates domain knowledge, data analysis, and machine learning within a structured framework to autonomously conduct data-driven scientific discovery. Once provided with a research question and dataset, Aleks iteratively formulated problems, explored alternative modeling strategies, and refined solutions across multiple cycles without human intervention. In a case study on grapevine red blotch disease, Aleks progressively identified biologically meaningful features and converged on interpretable models with robust performance. Ablation studies underscored the importance of domain knowledge and memory for coherent outcomes. This exploratory work highlights the promise of agentic AI as an autonomous collaborator for accelerating scientific discovery in plant sciences.

agent, alek, artificial intelligence, (16 more...)

arXiv.org Artificial Intelligence

2508.19383

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > California > Sacramento County > Vineyard (0.04)
North America > United States > California > Napa County (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.88)

Industry:

Health & Medicine (1.00)
Food & Agriculture > Agriculture (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

Predicting Fetal Birthweight from High Dimensional Data using Advanced Machine Learning

Kapure, Nachiket, Joshi, Harsh, Mistri, Rajeshwari, Kumari, Parul, Mali, Manasi, Purohit, Seema, Sharma, Neha, Panday, Mrityunjoy, Yajnik, Chittaranjan S.

arXiv.org Artificial IntelligenceFeb-20-2025

Birth weight serves as a fundamental indicator of neonatal health, closely linked to both early medical interventions and long-term developmental risks. Traditional predictive models, often constrained by limited feature selection and incomplete datasets, struggle to achieve overlooking complex maternal and fetal interactions in diverse clinical settings. This research explores machine learning to address these limitations, utilizing a structured methodology that integrates advanced imputation strategies, supervised feature selection techniques, and predictive modeling. Given the constraints of the dataset, the research strengthens the role of data preprocessing in improving the model performance. Among the various methodologies explored, tree-based feature selection methods demonstrated superior capability in identifying the most relevant predictors, while ensemble-based regression models proved highly effective in capturing non-linear relationships and complex maternal-fetal interactions within the data. Beyond model performance, the study highlights the clinical significance of key physiological determinants, offering insights into maternal and fetal health factors that influence birth weight, offering insights that extend over statistical modeling. By bridging computational intelligence with perinatal research, this work underscores the transformative role of machine learning in enhancing predictive accuracy, refining risk assessment and informing data-driven decision-making in maternal and neonatal care. Keywords: Birth weight prediction, maternal-fetal health, MICE, BART, Gradient Boosting, neonatal outcomes, Clinipredictive.

birth weight, dataset, regression 0, (13 more...)

arXiv.org Artificial Intelligence

2502.1427

Country:

Asia > India (0.04)
Asia > Singapore (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)

Add feedback

Computing Gram Matrix for SMILES Strings using RDKFingerprint and Sinkhorn-Knopp Algorithm

Ali, Sarwan, Mansoor, Haris, Chourasia, Prakash, Khan, Imdad Ullah, Patterson, Murray

arXiv.org Artificial IntelligenceDec-19-2024

In molecular structure data, SMILES (Simplified Molecular Input Line Entry System) strings are used to analyze molecular structure design. Numerical feature representation of SMILES strings is a challenging task. This work proposes a kernel-based approach for encoding and analyzing molecular structures from SMILES strings. The proposed approach involves computing a kernel matrix using the Sinkhorn-Knopp algorithm while using kernel principal component analysis (PCA) for dimensionality reduction. The resulting low-dimensional embeddings are then used for classification and regression analysis. The kernel matrix is computed by converting the SMILES strings into molecular structures using the Morgan Fingerprint, which computes a fingerprint for each molecule. The distance matrix is computed using the pairwise kernels function. The Sinkhorn-Knopp algorithm is used to compute the final kernel matrix that satisfies the constraints of a probability distribution. This is achieved by iteratively adjusting the kernel matrix until the marginal distributions of the rows and columns match the desired marginal distributions. We provided a comprehensive empirical analysis of the proposed kernel method to evaluate its goodness with greater depth. The suggested method is assessed for drug subcategory prediction (classification task) and solubility AlogPS ``Aqueous solubility and Octanol/Water partition coefficient" (regression task) using the benchmark SMILES string dataset. The outcomes show the proposed method outperforms several baseline methods in terms of supervised analysis and has potential uses in molecular design and drug discovery. Overall, the suggested method is a promising avenue for kernel methods-based molecular structure analysis and design.

artificial intelligence, machine learning, smile string, (17 more...)

arXiv.org Artificial Intelligence

2412.14717

Country:

Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Alameda County > San Leandro (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models

Nezhad, Sina Bagheri, Agrawal, Ameeta, Pokharel, Rhitabrat

arXiv.org Artificial IntelligenceDec-16-2024

Multilingual language models (MLLMs) are crucial for handling text across various languages, yet they often show performance disparities due to differences in resource availability and linguistic characteristics. While the impact of pre-train data percentage and model size on performance is well-known, our study reveals additional critical factors that significantly influence MLLM effectiveness. Analyzing a wide range of features, including geographical, linguistic, and resource-related aspects, we focus on the SIB-200 dataset for classification and the Flores-200 dataset for machine translation, using regression models and SHAP values across 204 languages. Our findings identify token similarity and country similarity as pivotal factors, alongside pre-train data and model size, in enhancing model performance. Token similarity facilitates cross-lingual transfer, while country similarity highlights the importance of shared cultural and linguistic contexts. These insights offer valuable guidance for developing more equitable and effective multilingual language models, particularly for underrepresented languages.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.125

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(12 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)

Add feedback

A Comparative Study of Hyperparameter Tuning Methods

Dasgupta, Subhasis, Sen, Jaydip

arXiv.org Artificial IntelligenceAug-29-2024

The study emphasizes the challenge of finding the optimal trade-off between bias and variance, especially as hyperparameter optimization increases in complexity. Through empirical analysis, three hyperparameter tuning algorithms Tree-structured Parzen Estimator (TPE), Genetic Search, and Random Search are evaluated across regression and classification tasks. The results show that nonlinear models, with properly tuned hyperparameters, significantly outperform linear models. Interestingly, Random Search excelled in regression tasks, while TPE was more effective for classification tasks. This suggests that there is no one-size-fits-all solution, as different algorithms perform better depending on the task and model type. The findings underscore the importance of selecting the appropriate tuning method and highlight the computational challenges involved in optimizing machine learning models, particularly as search spaces expand.

algorithm, dataset, hyperparameter, (16 more...)

arXiv.org Artificial Intelligence

2408.16425

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Oceania > Guam (0.04)
North America > United States > California > Orange County > Irvine (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry: Energy (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
(2 more...)

Add feedback

SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models

Yin, Chun, Chi, Tai-Shih, Tsao, Yu, Wang, Hsin-Min

arXiv.org Artificial IntelligenceJun-12-2024

Representations from pre-trained speech foundation models (SFMs) have shown impressive performance in many downstream tasks. However, the potential benefits of incorporating pre-trained SFM representations into speaker voice similarity assessment have not been thoroughly investigated. In this paper, we propose SVSNet+, a model that integrates pre-trained SFM representations to improve performance in assessing speaker voice similarity. Experimental results on the Voice Conversion Challenge 2018 and 2020 datasets show that SVSNet+ incorporating WavLM representations shows significant improvements compared to baseline models. In addition, while fine-tuning WavLM with a small dataset of the downstream task does not improve performance, using the same dataset to learn a weighted-sum representation of WavLM can substantially improve performance. Furthermore, when WavLM is replaced by other SFMs, SVSNet+ still outperforms the baseline models and exhibits strong generalization ability.

linear layer, representation, svsnet, (16 more...)

arXiv.org Artificial Intelligence

2406.08445

Country:

Asia > Taiwan (0.05)
South America > Colombia > Meta Department > Villavicencio (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)

Add feedback

PRIME: A CyberGIS Platform for Resilience Inference Measurement and Enhancement

Mandal, Debayan, Zou, Dr. Lei, Wilkho, Rohan Singh, Abedin, Joynal, Zhou, Bing, Cai, Dr. Heng, Baig, Dr. Furqan, Gharaibeh, Dr. Nasir, Lam, Dr. Nina

arXiv.org Artificial IntelligenceApr-15-2024

In an era of increased climatic disasters, there is an urgent need to develop reliable frameworks and tools for evaluating and improving community resilience to climatic hazards at multiple geographical and temporal scales. Defining and quantifying resilience in the social domain is relatively subjective due to the intricate interplay of socioeconomic factors with disaster resilience. Meanwhile, there is a lack of computationally rigorous, user-friendly tools that can support customized resilience assessment considering local conditions. This study aims to address these gaps through the power of CyberGIS with three objectives: 1) To develop an empirically validated disaster resilience model - Customized Resilience Inference Measurement designed for multi-scale community resilience assessment and influential socioeconomic factors identification, 2) To implement a Platform for Resilience Inference Measurement and Enhancement module in the CyberGISX platform backed by high-performance computing, 3) To demonstrate the utility of PRIME through a representative study. CRIM generates vulnerability, adaptability, and overall resilience scores derived from empirical hazard parameters. Computationally intensive Machine Learning methods are employed to explain the intricate relationships between these scores and socioeconomic driving factors. PRIME provides a web-based notebook interface guiding users to select study areas, configure parameters, calculate and geo-visualize resilience scores, and interpret socioeconomic factors shaping resilience capacities. A representative study showcases the efficiency of the platform while explaining how the visual results obtained may be interpreted. The essence of this work lies in its comprehensive architecture that encapsulates the requisite data, analytical and geo-visualization functions, and ML models for resilience assessment.

adaptability, resilience, vulnerability, (14 more...)

arXiv.org Artificial Intelligence

2404.09463

Country:

North America > United States > California (0.14)
North America > Mexico (0.14)
North America > United States > Oregon (0.04)
(23 more...)

Genre: Research Report (1.00)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

Modeling Large-Scale Walking and Cycling Networks: A Machine Learning Approach Using Mobile Phone and Crowdsourced Data

Saberi, Meead, Lilasathapornkit, Tanapon

arXiv.org Artificial IntelligenceApr-2-2024

Walking and cycling are known to bring substantial health, environmental, and economic advantages. However, the development of evidence-based active transportation planning and policies has been impeded by significant data limitations, such as biases in crowdsourced data and representativeness issues of mobile phone data. In this study, we develop and apply a machine learning based modeling approach for estimating daily walking and cycling volumes across a large-scale regional network in New South Wales, Australia that includes 188,999 walking links and 114,885 cycling links. The modeling methodology leverages crowdsourced and mobile phone data as well as a range of other datasets on population, land use, topography, climate, etc. The study discusses the unique challenges and limitations related to all three aspects of model training, testing, and inference given the large geographical extent of the modeled networks and relative scarcity of observed walking and cycling count data. The study also proposes a new technique to identify model estimate outliers and to mitigate their impact. Overall, the study provides a valuable resource for transportation modelers, policymakers and urban planners seeking to enhance active transportation infrastructure planning and policies with advanced emerging data-driven modeling methodologies.

count data, study area, training and testing, (15 more...)

arXiv.org Artificial Intelligence

2404.00162

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.48)

Industry:

Health & Medicine (1.00)
Telecommunications (0.93)
Government > Regional Government > Oceania Government > Australia Government (0.69)
Transportation > Infrastructure & Services (0.68)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback